The Actual Question
I was planning a trip to New York and needed a hotel near a specific address, but "near" meant "reachable within 20 minutes by subway, departing at 4:30 PM on a weekday." Google Maps can give you transit directions between two points; it cannot search for hotels filtered by transit reachability at a specific departure time. No hotel booking site can either.
So I built a pipeline using only open data: no Google APIs, no API keys (mostly), no monthly fees.
Hotels Within Range
The Overpass API queries OpenStreetMap for anything tagged as lodging within 5 km of the target address: 316 hotels in the initial results. For each, I needed walking distance to the nearest subway station and from the destination station to the target address. OSRM's Table API computes these in bulk: batches of 80 origin-destination pairs using actual street geometry, not straight-line distance.
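A minimal sketch of how the two requests could be assembled, not the pipeline's actual code. The tag list standing in for "lodging", the `foot` profile, the base URL of a self-hosted OSRM instance, and the way a batch is split into sources and destinations are all assumptions. The functions only build strings and make no network calls.

```python
def overpass_lodging_query(lat, lon, radius_m=5000):
    """Build an Overpass QL query for lodging within radius_m of a point."""
    # Assumed definition of "lodging"; the post does not list the exact tags.
    tags = "hotel|hostel|motel|guest_house"
    return (
        "[out:json][timeout:60];\n"
        f'nwr["tourism"~"^({tags})$"](around:{radius_m},{lat},{lon});\n'
        "out center;"
    )


def chunked(items, size=80):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def osrm_table_url(base_url, sources, destinations, profile="foot"):
    """Build an OSRM Table request URL.

    sources and destinations are lists of (lat, lon) tuples. OSRM expects
    lon,lat order, with index lists selecting which coordinates act as
    sources and which as destinations.
    """
    coords = sources + destinations
    coord_str = ";".join(f"{lon},{lat}" for lat, lon in coords)
    src_idx = ";".join(str(i) for i in range(len(sources)))
    dst_idx = ";".join(str(i) for i in range(len(sources), len(coords)))
    return (
        f"{base_url}/table/v1/{profile}/{coord_str}"
        f"?sources={src_idx}&destinations={dst_idx}&annotations=distance"
    )
```

The query string would be sent to an Overpass endpoint, and each URL fetched from an OSRM server built with a walking profile. With 316 hotels and a batch size of 80, `chunked` produces four batches.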
Walking speed is set to 2 km/h (a conservative pace accounting for crosswalks, crowds, and luggage), with a circuity factor of 1.25 for Manhattan's grid.
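As a rough illustration of the arithmetic, here is a small sketch that converts distance to walking minutes. The post does not say exactly where the circuity factor is applied; this sketch assumes it scales straight-line (haversine) distance when no routed distance is available. The function names are hypothetical.

```python
import math

WALK_SPEED_KMH = 2.0   # walking speed stated in the text
CIRCUITY = 1.25        # circuity factor stated in the text
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walk_minutes(distance_m, speed_kmh=WALK_SPEED_KMH):
    """Minutes needed to walk a street distance at the given speed."""
    return distance_m / 1000 / speed_kmh * 60


def estimated_walk_minutes(straight_line_m, circuity=CIRCUITY):
    """Assumed fallback: inflate straight-line distance, then convert to minutes."""
    return walk_minutes(straight_line_m * circuity)
```

Under these settings, 400 m as the crow flies becomes 500 m of assumed street distance, which takes 15 minutes at 2 km/h.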
Schedule-Aware Subway Routing
The MTA publishes its full subway schedule as a GTFS feed, a collection of CSVs describing stops, routes, trips, and departure times. The routing algorithm is a Dijkstra-like backward propagation from the target station: given a desired arrival time of 16:30, what's the latest train I can catch at each station to arrive on time? This accounts for transfer times, express vs. local service, and the actual schedule, not just headway averages.
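The backward pass can be sketched as follows. This is not the pipeline's code: it swaps in a reverse connection scan (processing scheduled hops from latest to earliest) for the Dijkstra-like priority queue, since both answer the same "latest departure that still arrives on time" question. The `Connection` tuple, the fixed two-minute transfer time, and the toy schedule in the usage note are assumptions, and transfers are modeled only within a single stop.

```python
from collections import namedtuple

# One scheduled hop between consecutive stops of a trip, times in seconds.
Connection = namedtuple("Connection", "trip dep_stop dep_time arr_stop arr_time")


def parse_gtfs_time(text):
    """Convert 'HH:MM:SS' to seconds; GTFS hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def latest_departures(connections, target, deadline, min_transfer=120):
    """Return {stop: latest departure time that still reaches target by deadline}."""
    latest = {target: deadline}
    usable_trips = set()  # trips known to reach the target if you stay aboard
    for c in sorted(connections, key=lambda c: c.dep_time, reverse=True):
        # No transfer buffer is needed when the hop ends at the target itself.
        slack = 0 if c.arr_stop == target else min_transfer
        connects = c.arr_stop in latest and c.arr_time + slack <= latest[c.arr_stop]
        if c.trip in usable_trips or connects:
            usable_trips.add(c.trip)
            latest[c.dep_stop] = max(latest.get(c.dep_stop, 0), c.dep_time)
    return latest
```

For example, with a 16:30 deadline at stop `C`, a local `A` to `B` to `C` leaving at 16:00 and an express `A` to `C` leaving at 16:10 and arriving at 16:18, the result for `A` is 16:10. The station's travel time is then `deadline - latest[stop]`, which covers both waiting and riding.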
One subtlety that took debugging: GTFS uses a station hierarchy. A "station" like Times Square has multiple child stops (one per platform, per direction), and the routing needs to handle parent-child relationships correctly or you get phantom transfers between platforms that don't physically connect.
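One way to handle the hierarchy is sketched below, using the `parent_station` and `location_type` columns that GTFS defines in `stops.txt`. The sample rows and stop IDs are made up for illustration, and the transfer rule (same parent station, or an explicitly listed station pair such as one derived from `transfers.txt`) is an assumption rather than the rule the pipeline uses.

```python
import csv
import io

# Illustrative rows in the shape of stops.txt; IDs and names are hypothetical.
SAMPLE_STOPS_TXT = """stop_id,stop_name,location_type,parent_station
S1,Example Sq,1,
S1N,Example Sq,0,S1
S1S,Example Sq,0,S1
S2,Example Sq,1,
S2N,Example Sq,0,S2
"""


def load_parent_map(stops_txt):
    """Map every stop_id to its parent station (stations map to themselves)."""
    parents = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        parents[row["stop_id"]] = row["parent_station"] or row["stop_id"]
    return parents


def can_transfer(stop_a, stop_b, parents, linked_stations=frozenset()):
    """Allow a transfer only inside one station or across a listed station pair."""
    pa, pb = parents[stop_a], parents[stop_b]
    if pa == pb:
        return True
    return (pa, pb) in linked_stations or (pb, pa) in linked_stations
```

Here `S1N` and `S1S` share a parent and can exchange passengers, while `S1N` and `S2N` cannot unless the pair `("S1", "S2")` is listed, even though both carry the same station name.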
The output is a reachability map: for each subway station, the total travel time (walking + waiting + riding) to the destination.
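To turn the station-level map into a per-hotel figure, each hotel's walk to a nearby station is added to that station's travel time, and the best option wins. The sketch below assumes the station times already include the final walk to the target; the function name, the dictionary shapes, and the optional direct-walk comparison are illustrative choices, not the pipeline's actual interface.

```python
LIMIT_MINUTES = 20  # cutoff stated in the text


def hotel_travel_minutes(walk_to_station, station_minutes, direct_walk=None):
    """Return (minutes, mode) for the fastest option, or None if there is none.

    walk_to_station: {station_id: walking minutes from the hotel}
    station_minutes: {station_id: minutes from that station to the target}
    direct_walk: walking minutes straight to the target, if known
    """
    options = [
        (walk + station_minutes[station], "subway")
        for station, walk in walk_to_station.items()
        if station in station_minutes
    ]
    if direct_walk is not None:
        options.append((direct_walk, "walk"))
    return min(options) if options else None


def within_limit(result, limit=LIMIT_MINUTES):
    """True when a hotel's best option fits inside the time limit."""
    return result is not None and result[0] <= limit
```

A hotel 7 minutes from a station that is 9 minutes from the target scores 16 minutes by subway and passes the cutoff. If it is also a 12-minute walk from the target, the walk wins.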
Scoring
Hotels beyond 20 minutes get dropped. The remaining candidates need price and quality. DuckDuckGo scraping provides approximate nightly rates and review scores. It's free and unauthenticated but not perfectly reliable: some hotels don't show up, and prices vary by date. For a more nuanced signal, I sent hotel names and descriptions to an LLM via the Poe API, asking it to score cleanliness reputation, noise level, and neighborhood safety. The scores are imprecise, but they add a dimension that star ratings don't capture.
The Scatterplot
The final output is an interactive scatterplot: price vs. composite value score, with points colored by whether the hotel is reachable by walking or subway. The Pareto frontier (the hotels where you can't improve price without sacrificing quality, or vice versa) is immediately visible.
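For readers who want the frontier as data rather than as a visual pattern, it can be computed with one sorted pass. This is a minimal sketch assuming each hotel is a `(name, price, score)` tuple where lower price and higher score are better; the hotel names and numbers in the usage note are invented.

```python
def pareto_frontier(hotels):
    """Return hotels not dominated on (lower price, higher score).

    hotels: iterable of (name, price, score) tuples.
    """
    frontier = []
    best_score = float("-inf")
    # Cheapest first; among equal prices, the highest score comes first.
    for name, price, score in sorted(hotels, key=lambda h: (h[1], -h[2])):
        # A hotel stays only if it beats every cheaper (or equally priced) one.
        if score > best_score:
            frontier.append((name, price, score))
            best_score = score
    return frontier
```

Given hotels priced at 100, 150, 160, and 200 with scores 6, 8, 7, and 9, the 160 option drops out because the 150 option is both cheaper and better rated.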
Of the original 316 hotels, 187 were reachable within 20 minutes by transit. A sorted list forces a single ordering; the scatterplot shows the full tradeoff space and lets you decide what matters. A hotel that's "15 minutes away" at noon might be 25 at 4:30 PM if the express doesn't run; schedule-aware routing changed which hotels made the shortlist in ways I hadn't expected.
The pipeline is NYC-specific (MTA subway), but the architecture generalizes to any city with a GTFS feed: Berlin, London, and Tokyo all publish transit data in the same format.